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Abstract 

This paper presents stable, radix-2, completely recursive dis¬ 
crete cosine transformation algorithms DCT-I and DCT-III 
solely based on DCT-I, DCT-II, DCT-III, and DCT-IV having 
sparse and orthogonal factors. Error bounds for computing 
the completely recursive DCT-I, DCT-II, DCT-III, and DCT- 
IV algorithms having sparse and orthogonal factors are ad¬ 
dressed. Image compression results are presented based on 
the recursive 2D DCT-II and DCT-IV algorithms for image 
size 512 x 512 pixels with transfer block sizes 8 x 8 , 16 x 16, 
and 32 x 32 with 93.75% absence of coefficients in each 
transfer block. Finally signal flow graphs are demonstrated 
based on the completely recursive DCT-I, DCT-II, DCT-III, 
and DCT-IV algorithms having orthogonal factors. 

1 INTRODUCTION 

The Fast Fourier Transform is used to efficiently compute 
the Discrete Fourier Transform (DFT) and its inverse. The 
DFTs are widely used in numerous applications in applied 
mathematics and electrical engineering [27,23,24, 3,19, 30], 
etc. 

The DFT uses complex arithmetic. The DFT of a se¬ 
quence of ft-input {x^Zq is the sequence of ^-output 
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where CD„ = e~~^. There exist real analogues of the DFT, 
namely the Discrete Cosine Transforms and Discrete Sine 
Transforms, the main types are from I to IV. Similar to 
the I-IV variants of cosine and sine matrices transform the 
sequence of ft-input into a sequence of n-output via the 
transform matrices stated in Table 0 . where for DCT-I 
j,k = 0,1, • • • , n, DST-I 7 ,k = 0,1, • • • ,n — 2, DCT and DST 
H-IV j,k = o, 1 , • • • ,n — 1 , e„( 0 ) = e„(n) = ^=, e„(j) = 1 for 
7 G {1,2,•••,«—1} and n > 2 is an integer. Among DCT 
I-IV transformations, C ^ +1 was introduced in ll3TTl . C„ and its 
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Table 1: Cosine and Sine Transform Matrices 


inverse C 7// were introduced in Q. and C 1 / was introduced 
into digital signal processing in (9). Moreover, among DST 
I-IV transformations, S I n _ l and S 1 / were introduced in mm 
and S'J and its inverse S'J 1 were introduced in d. These 
classifications were also stated in l3Qlfl9l . 

It has been stated, in e.g. |2l] 22, 24], that these cosine 
and sine matrices of types I-IV are orthogonal. Strang, 
in m, proved that the column vectors of each cosine 
matrix are eigenvectors of a symmetric second difference 
matrix under different boundary conditions, and are hence 
orthogonal. Later Britanak, Yip, and Rao in 0 followed 
very closely the presentation made by Strang’s l24ll to 
point out that the column vectors of each cosine and 
sine matrix of types I-VIII are eigenvectors of a symmet¬ 
ric second difference matrix. Due to properties of these 
DCT and DST, it was shown by many authors (see e.g. 

that these 

symmetric and asymmetric (rarely used) versions of DCT 
and DST can be widely used in image processing, signal 
processing, finger print enhancement, quick response code 
(QR code), etc. 

To obtain real, fast DCT or DST algorithms one can 
mainly use a polynomial arithmetic technique or a ma¬ 
trix factorization technique. In the polynomial arithmetic 
technique (see e.g. 11251 ), components of C n x or S n x are 












































interpreted as the nodes of a degree n polynomial, and then 
one applies the divide and conquer technique to reduce the 
degree of the polynomial. Later it was found (see e.g. (26)) 
that the polynomial arithmetic technique leads to inferior 
numerical stability of the DCT and DST algorithms. The 
matrix factorization technique is the direct factorization of 
the DCT or DST matrices into the product of sparse matrices 
(see e.g. |30] |32] [3] M [HI). The matrix factorization for 
DST-I in (32) used the results in |[5) to decompose DST-I 
into DCT and DST. Also the decomposition for DCT-II in 
(30) is a slightly different version of the result in 0. Though 
one can find orthogonal matrix factorizations for DCT and 
DST in [30], the resulting algorithms in [30] are not com¬ 
pletely recursive, and hence do not lead to simple recursive 
algorithms. Moreover 0 has used the same factorization 
for DST-II and DST-IV as in (30). On the other hand, one 
can use these (301 13 Ell results to derive recursive, stable 
algorithms as stated in [19], 18j. 

However, CCD has offered stable, recursive DCT-II and 
DCT-IV algorithms, based on DCT-II and DCT-IV. Thus 
this paper completes the picture and provides completely 
recursive, stable, radix-2 DCT-I and DCT-III algorithms 
that are solely defined via DCT I-IV, having sparse and 
orthogonal factors. The paper also addresses the error bounds 
on computing completely recursive algorithms for DCT 
I-IV. Moreover, this paper elaborates image compression 
(absence of 93.75% coefficients in each transfer block) and 
signal transform designs based on the completely recursive 
algorithms based on DCT I-IV. 

In section [2] we derive factorizations for DCT-I and DCT-III 
having orthogonal and sparse matrices, and state completely 
recursive DCT I-IV algorithms solely defined via DCT I-IV 
having sparse, orthogonal, and rotation/rotation-reflection 
matrices. Next, in section [3j we present the arithmetic cost 
of computing these algorithms. In section [4] we derive error 
bounds in computing these algorithms and discuss the stabil¬ 
ity. Finally in sections [5] and [6] respectively, we demonstrate 
image compression results and signal flow graphs based on 
these completely recursive DCT I-IV algorithms. 


2 COMPLETELY RECURSIVE RADIX-2 
DCT ALGORITHMS HAVING ORTHOG¬ 
ONAL FACTORS 

This section introduces sparse and orthogonal factoriza¬ 
tions for DCT-I and DCT-III matrices. In the meantime, we 
present completely recursive, radix-2 DCT I-IV algorithms 
solely defined via DCT I-IV, having sparse, orthogonal, and 
butterfly matrices. One can observe a variant of the DCT-II 
and DCT-IV algorithms having almost orthogonal factors in 

m. 


The following notations and sparse matrices are used 
frequently in this paper. Denote an involution matrix I n 
by 4 x = [x n -\,x n - 2 , • • • Ao] r > a diagonal matrix D n by 
D n x = diag ((—l)*)J* =Q x, an even-odd permutation matrix 
P n (n > 3) by 

p x= f [xo,X2,-- ,x n -2,xi,X3,--- ,x n -i] T even n, 

I [*0A2,--' ,Xn-i,xi,X3,'-- ,x n - 2 ] r odd it, 
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for any x = [*/]” =0 , and orthogonal matrices (n > 4) by 
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DCT-II and DCT-IV algorithms are the keys for the com¬ 
pletely recursive procedure, so for a given vector x G M”, we 
present algorithms in order y = C 7/ x, y = C 1 / x, y = C 7// x 
and y = C 7 +1 x. Following the matrix factorizations for DCT- 
II and DCT-IV in JT9l . let us first state recursive DCT-II and 
DCT-IV having orthogonal factors via algorithms © and 
(2.2), respectively. 


Algorithm 2.1. (cos2(x,n)) 

Input: n = 2*(t >\), n\ = x E W 1 . 


1. Ifn = 2, then 

i [ 1 1 1 

y:= 7l[ 1 -1 J x ' 

2. If n> 4, then 

[ u j]j =0 = Hn X- 

zl : =cos2 ([u J } n i 'S 0 i ,«,) , 
z2:=cos4([M y -]"I^,ni), 
y:=Pj(zl r ,z2 r ) r . 
Output: y = C^ 7 x. 

























( 1 , 2 ) block becomes 


Algorithm 2.2. (cos4(x,n)) 

Input: n = 2 t {t > 1), n\ = x G ' 

1. Ifn = 2 , r/zezz 
cos 5 

y:= • & a x. 
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( 2 . 1 ) block becomes 

( 2 . 2 ) block becomes 


£„(fc)COS ^+ 1 ) fat 


J j,k=0 


w : = U„ (zl r ,z2 r ) , 
y:=^[w. 

Output: y = x. 

By using the well known transpose property between DCT- 
II and DCT-III we can state an algorithm for DCT-III via 
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(2.3). This algorithm executes recursively with the DCT-II 


and DCT-IV algorithms. 
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2. If ft > 4, r/zeft 
M yJj=o 

zl :=cos3 


[ W 7']/=0 
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z2 :=cos4 

y:=//J(zl r ,z2 r )V 
Output: y = C 777 x. 

Before stating the algorithm for DCT-I let us derive a 
sparse and orthogonal factorization for DCT-I. 

Lemma 2.4. Let n > 4 be an even integer. The matrix C 7 +1 
can he factored in the form 


Thus an algorithm for DCT-I can be stated via (2.5), which 
executes recursively with DCT II-IV algorithms. 

Algorithm 2.5. (cosl(x,n +1)) 

Input: ft = 2 t {t > 1), fti = x G M n+1 . 

7. Ifn = 2, r/zezz 


y := 


" 1 

1 

0 " 


" 1 

0 

1 " 

0 

0 

V 2 


0 

V 2 

0 

1 

-1 

0 _ 


1 

0 

-1 


2. If ft > 4, r/zezz 
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zl:s=cosl([«_ / ]"^ 0 ,ni + l), 
z2 js=cos3 ([m_/]" = „ 1+1 ,«i) , 
y: = ^n+i (zl r ,z2 r ) r . 
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Output: y = C 7 , x x. 


Proof Let’s apply P w +i to C 7 +1 to permute rows and then 
partition the resultant matrix. So 


( 1 , 1 ) block becomes 
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3 ARITHMETIC COST OF COMPUTING 
DCT ALGORITHMS 

We first calculate the arithmetic cost of computing DCT 
I-IV algorithms. Let’s denote the number of additions and 













































multiplications required to compute - say a length n DCT 
II algorithm: y = C 1 ^ x by #ft(DCT-II,ft) and #ftr(DCT-II,ft). 
Note that the multiplication of ±1 and permutations are not 
counted. Once the cost is computed we show numerical re¬ 
sults for the speed improvement factor of these algorithms. 


3.1 Number of additions and multiplications 
in computing DCT I-IV algorithms 

Here we calculate the arithmetic cost of computing the 
DCT I-IV algorithms in order ( |2.1| ), ( |2.2| ), (2.3) and (2.5). 
The cost of addition in computing DCT-II and DCT-IV al¬ 
gorithms is the same as in ED, but the cost of multiplica¬ 
tion is different from ED- The latter is because in this pa¬ 
per, not only DCT-I and DCT-III algorithms but also DCT-II 
and DCT-IV algorithms have orthogonal factors not almost 
orthogonal factors. Let us first derive explicitly the number of 
multiplications required to compute DCT-II and DCT-IV al¬ 
gorithms and then the arithmetic cost of DCT-III and DCT-I 
algorithms respectively. 

Lemma 3.1. Let n = 2* (t > 2) be given. Using algorithms 
O and \2.2\ , the arithmetic cost of computing length n 
DCT-II algorithm is given by 


#a(DCT-II,n) = \nt - \n- i(-l) f +1, 
#m(DCT-II,n) = (3) 

Proof Following algorithms and \2.2\ 

#m(DCT-II, n) = #m (DCT-II, § )+#m (DCT-IV, §) 

+ #m (H n ), 

#m (DCT-IV, n) = #m (U n ) +2 • #m (DCT-II, §) 

+ #m(R n ). 

(4) 


By referring to the structures of H n ,U n , and R n 


If #m (DCT-II, 2 f ) = a! (where a 7 ^ 0) is a solution then the 
above follows 

a! - a ' -1 - 2 (a!~ 2 ) = 5 • 2 f_1 - 2. ( 6 ) 

The homogeneous solution of the above is given by solving 
the characteristic equation 

a r_2 (a 2 -a-2) = 0. 


From which we get 

#m (DCT-II, 2 l ) = rff + r^[— 1/ + particular solution 

where r\ and r 2 are constants. Let a! = r^ + rtf • 2* (where r 3 
and r 4 are constants) be the particular solution. Substituting 
this potential equation into and equating the coefficients 
we can find that 

#m(DCT-II, 2‘) = n2‘+ r 2 (-l) r + yt-2 r + l 

Using the initial conditions #m (DCT-II, 2) = 2 and 
#m (DCT-II, 4) = 10, we can determine the general solution 

#w(DCT-II, 2‘) = | • t ■ 2 l - y 2 f + 1 (-iy + 1 (7) 

Thus substituting n = 2 f we can obtain the number of multi¬ 
plications required to compute DCT-II algorithm as stated in 

Again by algorithms <H3 and 
state 

#< 2 (DCT-II, n) = #a (DCT-II, ^ +2-#a (DCT-II, fj +2n-2. 

Since n = If the second order linear difference equation with 
respect to t can be given via 

#a(DCT-II,2 f ) - #a (DCT-II,2 /_1 ) -2-#a (DCT-II,2 f_2 ) 
= 2 ,+i - 2. 


2 . 2 ) together with (pb, we can 


#a (H n ) = n , #m (H n ) = n , 
#a (U n ) =n — 2, #m (U n ) —n — 2, 
#a (R n ) = ft, #m (R n ) = 2 ft, 


As derived analogously in the cost of multiplication, we 
can solve the above equation under the initial conditions 
#a (DCT-II, 2) = 2 and #a (DCT-II, 4) = 8 to obtain 


(5) 


Thus 

#m(DCT-II,ft) = #m (DCT-II, §) + 2-#m (DCT-II, fj 

+ fft-2. 

Since n = 2* we can obtain the linear difference equation of 
order 2 with respect to t 

#ftr(DCT-II, 2 ? ) - #m (DCT-II, 2 ?_1 ) -2 -#m (DCT-II, 2* -2 ) 

= 5 • 2 l ~ l — 2. 


4 8 1 

(DCT-II, ft) = -ftJ--ft--(-iy + l. ( 8 ) 
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Corollary 3.2. Let n = 2* (t > 2) be given. Using algorithms 
(22) and <113, the arithmetic cost of computing length n 
DCT-IV algorithm is given by 
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Proof. The number of multiplications required to compute 
DCT-IV algorithm can be found by substituting (|7j) at |(= 
2 f ~ l ) into the equation ([ 4 ]) 

#m (DCT-IV ,n) = n —2 + 2 • U -(t - 1) - ™ ^ 

+ i(-ir 1 +ij+2» 

Simplifying the above gives the cost of multiplication 

#m(DCT-IV,ft) = -nt + -n — -(—1)*. 

Similarly, the number of additions required to compute DCT- 
IV algorithm can be found by substituting |8j) at |(= 2 l ~ l ) 
to 

#a (DCT-IV, n) = #a (U n ) + 2 -#a (dCT-II, |) + #a (R n ) 
= 2-#a (dCT-II, ^ ) + 2n - 2. 
Simplifying the above yields 

4 2 2 

#a (DCT-IV, ft) = - nt--n+-(-iy. 


Proof Referring to the DCT-I algorithm ( |2.5| ) 

#a (DCT-I,« + 1) =#a (dCT-I, ^ + l) +#a (dCT-III, 'fj 
+ #a (H n . |_i) 

(12) 

The structure of H n +1 leads to #a (H n + 1) = n. This together 
with the arithmetic cost of computing DCT-III ( |T()| ) algorithm, 
we can rewrite © 

#a (DCT-I, ft + 1) =#a ^DCT-I, ^ + 1^ + -nt — -n 

Since n = 2f the above simplifies to the first order linear dif¬ 
ference equation (respect to t > 2) 

#a (DCT-I, 2‘ + 1 )-#a (DCT-I,2'^' + l) 

2 1 1 (13) 

= y- 2? -9 2< +9 (-o'+i 


We can obtain the number of additions required to compute 
the DCT-I algorithm by solving ( [13] ) under the initial con¬ 
dition #a (DCT-I, 3) = 4. Analogously, one can solve the 
first order linear difference equation under the initial condi¬ 
tion #m (DCT-I, 3) =5 to obtain the number of multiplica¬ 
tions. □ 


□ 


The DCT-III algorithm (2.3) was stated using the transpose 
property of matrices so the following corollary is trivial. 


Corollary 3.3. Let n = 2* (t > 2) be given. If DCT-III could 
be computed by using algorithms (2.3), (2.2), and © then 
the arithmetic cost of computing a length n DCT-III algorithm 
is given by 


#a(DCT-III,n) = \nt- \n- i(-l) ? + l, 
#m(DCT-III,n) = \nt - fn+ g(-l) f +1. (10) 


Remark 3.4. By using the DCT-III algorithm (2.3) and the 
arithmetic cost of computing the DCT-IV algorithm (in corol¬ 
lary ( |3.2| )), one can obtain the same results as in corollary 

([331. 


Let us state the arithmetic cost of computing the DCT-I algo¬ 
rithm dO). 


Lemma 3.5. Let n = 2* (t > 2) be given. Using algorithms 
) and the arithmetic cost of a DCT-I 
algorithm of length n + 1 is given by 


(2.5), (2.3), (2.2 


ffft ( DCT-I ^ ft -f 1) — jnt — iyft ~\~ yg (—If 1 ^ 

#m (DCT-I, n + 1) = ^nt — ^-n — yg(—1)* +t + ^ (11) 


3.2 Speed improvement factor of DCT I-IV al¬ 
gorithms 

Based on the results in lemmas |3.1[ |3.5| and corollaries 
|3.2| |3.3| we graph the speed improvement factor of DCT 
I-IV algorithms having orthogonal factors. It is known that 
the speed improvement factor plays a critical role in the 
DFT algorithms as it gives us an idea about the processing 
speed of the algorithms. We should recall here that this factor 
increases with the size of matrix. 

In our case, the speed improvement factor says the 
ratio between the number of additions and multiplications 
required to compute the DCT I-IV algorithms, and the direct 
computation cost of computing these algorithms which is 
2ft 2 — ft for DCT II-IV, and 2ft 2 + 3ft + 1 for DCT-I. Figure 
[T] shows the speed improvement factor corresponding to the 
DCT I-IV algorithms with respect to the size of matrix. These 
numerical data correspond to MATLAB (R2014a version) 
with machine precision 2.2 x 10 -16 . 

4 ERROR BOUNDS AND STABILITY OF 
DCT ALGORITHMS 

Error bounds and stability of computing the DCT I-IV al¬ 
gorithms are the main concern in this section. Here, to verify 
the stability, we will use error bounds (using perturbation of 












Figure 1: Speed improvement factor of DCT I-IV algorithms 


the product of matrices stated in 0) in computing these algo¬ 
rithms. Let us assume that the computed trigonometry func¬ 
tions (i d r := sin ^ or cos ^ are the entries of the butterfly 
matrix) d r are used and satisfy 

d r = d r + £ n |^r| — P (14) 

for all r = 1,3,5, • • • ,n — 1, where p := Off) and u is the unit 
roundoff. 


Let’s recall the perturbation of the product of matrices 
stated in 0 i.e. if A k + AA k G R nxn satisfies \AA k \ < 8 k \A k \ 
for all k, then 


m 
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]~[ (A* + AAk) - P[ Ak 

k =0 k =0 


- ri( i+8 *) _i fi 


,k =0 


k=0 


Ah 


where S^| < u. Moreover,recall ]^[(l-f8^) ±1 = l + 0„ where 

k= l 

K\ <T^=-'Vn and y k + u < y*+i, Tfc + 7/ + YitYj < Tfc+j 

from 0. 


Let us derive error bounds for computing recursive DCT 
I-IV algorithms with the help of the perturbations in a matrix 
product. 


Theorem 4.1. Let y = //(C 77 x), where n = 2 t (t> 2), be com¬ 
puted using the algorithms \2.1\ , \2.2\ , and assume that 
holds, then 


l|y-y|| 2 < 77(7-1) 
I|y|| 2 _ i-Y7(7-i)' 


(15) 


Proof. Using the algorithms and the computed 

matrices (in terms of the computed d r ) for k = 1,2, • • • , f — 
2 : 


y = fl (pj P[Fi P^F 2 • • • Pf_ 2 F,_ 2 C,_! G,_ 2 ■ • • G 2 G! G 0 x) 

= PjP[(Fi+AFi)-Pf_ 2 (F,_ 2 + AF ( _ 2 )(C r _i+AC,_i) 

(Gt-2 + AG^-2) • • • (G 2 + AG2)(Gi + AGi)(Gq + AGq)x 


Each Ffr is formed containing a combination of matrices Ijl and C/^. 
Using the fact that each row in has at most two non-zero entries 
with mostly ones per row: 

|AFjt| < y 2 |Fjtj for k= 1,2, •• • ,t — 2 

Also each G^ is formed containing a combination of matrices Hjl 
and Rjl except Gq = H n . Using the fact that each row in G^ has at 
most two non-zero entries per row: 

|AG 0 | < 72 |G 0 |, |AG^| < y 3 |g^| , 
for k = 1,2 • • •, t — 2 

G t -\ is a block diagonal matrix containing C 2 7 and C 2 V hence 
|AC,_!| <y 3 IQ-il 

Using direct call of computing trigonometric functions i.e. the view 
of ( pA] >, 

Gjfc = G^ + AG^, \AG k \<p\G k \ : 

Thus, overall 

? = Po P[ (Fl + AFi) • • • P ( r _ 2 (F ( _ 2 + AF ( _ 2 ) (C/_i + AC ( _i) 
(G,_ 2 + E,_ 2 ) • ■ • (G 2 + E 2 ) (Gi + Ex )(G 0 + AG 0 )x, 


|E*|<(#i + Ts(l+Ai))|Gt|<Y5|Gt| 

Hence 

|y-y| < [(i +y 2 r‘(i +y 3 )(i +y 5 r 2 -1] p£p[|Fi|--- 

Pf_ 2 |F ( _ 2 ||Q_ 1 ||G f _ 2 ||G ( _ 3 |---|Go||x| 

where 


(l+y 2 ) ? 1 (l+73)(l+Y5)' 2 -l<(l+Y 2 ) ? *(1 +Ys) f ‘-1 


Since F k ,C t -\,G k are orthogonal matrices, ||F^|| 2 = ||Q_i|| 2 = 
||G;t|| 2 = 1. By orthogonality of C 77 , ||y|| 2 = ||x|| 2 . Hence 


lly-y|| 2 < 


77 ( 7 - 1 ) 

1 - 77 ( 7 - 1 ) 


l|y|| 2 


□ 


Corollary 4.2. y = C'J x is forward and backward stable. 

Proof The above theorem says that radix 2 DCT-II yields a 
tiny forward error provided that sin ^ and cos ^ are com¬ 
puted stably. It immediately follows that the computation is 
backward stable because y = y + Ay = C 77 x + Ay implies y = 
C 77 (x + Ax) with . If we form y = C 77 x by using 

exactC" then |y-y|<y„ |C"| |x| so ||y-y|| 2 <y„ ||y|| 2 . As 
p is of order u , the C 77 has an error bound smaller than that 
for usual multiplication by the same factor as the reduction in 
complexity of the method, so DCT-II is perfectly stable. □ 


















The error bound of computing recursive DCT-IV algorithm 
can be derived as follows. 


Theorem 4.3. Let y = //(C 7V x), where n = 2* (t > 2), be 
computed using the algorithms \2.2\ , \2.1\ , and assume that 
© holds , then 


lly-y|| 2 < fjf_ 
l|y|| 2 “ i-Y7f' 


(16) 


Proof. Using the algorithms (2.2), (2.1), and the computed 
matrices (in terms of the computed d r ) for k = 0,1, • • • , t — 
2 : 


y = fl (Po U 0 P[ Ui ■ • ■ P, r _ 2 U,_2 Q_1 W ,_ 2 • • ■'WiWo x) 

= p£ (Uo +AUo) • • -pf _ 2 (U,_2 +AU,_ 2 ) (C ; _i +AC,_i) 
(W,-2 + AW f _ 2 ) • • • (Wi + AWi) ( : Wo + AWo)x 


Each is formed containing a combination of matrices In. and U jl 
except Uo = U n . Using the fact that each row in has at most two 
non-zero entries with mostly ones per row: 


|AU^| < 72 \Uk\ for k = 0,l,-..,t-2 


Also each is formed containing a combination of matrices Hn_ 

and Rjl except Wo = R n • Using the fact that each row in has at 

2 k 

most two non-zero entries per row: 

|AW*| <y 3 |w*|, 

for k = 0,1 • • • ,t — 2 


Corollary 4.4. y = C\ x is forward and backward stable. 


Proof. The above theorem says that radix 2 DCT-IV yields 
a tiny forward error provided that sin ^ and cos ^ are com¬ 
puted stably. It immediately follows that the computation is 
backward stable because y = y + Ay = C 1 ^ x + Ay implies y = 
Cf (x + Ax) with . If we form y = C 1 / x by using 

exact Cf, then |y-y| < y„ \C™\ |x| so ||y-y|| 2 < Y« l|y|| 2 - 
As p is of order w, the C 7y has an error bound smaller than that 
for usual multiplication by the same factor as the reduction in 
complexity of the method, so DCT-IV is perfectly stable. □ 


Corollary 4.5. Let y = //(C 777 x), where n = 2ft > 2), be 
computed using the algorithms ( |2.3| ), ( |2.2| ), ( |2.1| ), and as¬ 
sume that © holds , then 


lly-y|| 2 < Y7(f — i) 
l|y|| 2 — i—Yv(? — i)' 


(17) 


Corollary 4.6. y = C'J 1 x is forward and backward stable. 

Finally, the error bound for computing DCT-I algorithm, 
which runs recursively with DCT II-IV algorithms, can be 
derived as follows. 


Theorem 4.7. Let y = //(C 7 +1 x), where n = 2 t {t > 2), be 
computed using the algorithms ( |2.5| ), (2.3), \2.2\ , \2.1\ , and 


assume 


that © holds. Then 


C t -\ is a block diagonal matrix containing C 2 7 and C 2 y hence 

IAC r _! | < 73 \Ct-i | 

Using direct call of computing trigonometric functions i.e. the view 
of ( [14] ), 

W* = W* + AW*, |AWjk| <^|WikU 

Thus, overall 

y = Pj (U 0 + AU 0 ) • • -Pf_ 2 (U,_ 2 + AU ( _ 2 )(Q_i + AQ_i) 

(W;_ 2 + E ? _ 2 ) • • • (W i + Ei) (W 0 + E 0 )x, 


|E*| < (*< + 73(1 +//))|W*| < Ys|W*| 

Hence 

|y-y| < [(i +Y 2 r‘(i +Y3 >(i +y 5 r 1 - 1 ] Po |u 0 |■ • • 
Pf-2 IU.-2I |C f _! I |W,_ 2 | - - - 1 W* I |W 0 | |x| 

where 


(i+Y 2 r 1 (i+Y3)(i+Y 5 r 1 -i 


< 

< 


(l+YsXl+Y?) 

(l+YzZ-l < 


r-l_l 

lit 

1-Y it' 


Since U^,C r _i,W^ are orthogonal matrices, ||U^|| 2 = ||Q_i|| 2 = 
||Wfc|| 2 = 1. By orthogonality of Cjf, ||y|| 2 = ||x|| 2 . Hence 


lit 

1 — lit 


l|y || 2 


□ 


l|y-y|| 2 < w 
lly|| 2 “1-7 it' 


( 18 ) 


Proof. Using the algorithms (2.5), (2.3), ( |2.2[ ), ( |2.1[ ). and the 
computed matrices B k (in terms of the computed d r ) for k = 


2 3 • 


J-2: 


y = fl (Ao Ai • • • A.f —2 C t -i B t -2 • • • B 2 B 1 B 0 x) 

— (Aq + AAq) • • • (A r _2 + AA ? _2) (C ? _i + AQ_i) 

(B^_2 + AB ? _2) • • • (B 2 + AB2)(Bi + ABi)(Bq + ABq)x 


Each A k is formed containing a combination of matrices P£ , 1 , , 

2 k ' 2 k 

H T n_ and U jl except Aq = Pj +1 and Ai = blkdiag . Us¬ 

ing the fact that each row in has at most two non-zero entries 
with mostly ones per row: 


|AA 0 | = 0 , |AAfc| < 72 |A^| for k= 1 , 2 ,*-- ,t — 2 

Also each is formed containing a combination of matrices Hjl + \ , 

2 k 

Hjl , Pjl and R jl except Bq = H n +\ and Bi = blkdiag , P| ^. 

Using the fact that each row in B^ has at most two non-zero entries 
per row: 

|AB 0 | < Y2 |Bo|, |ABi| < Y2 |Bi|, |aS^| < y 3 |S^|, 
for k = 2,3,--* ,t — 2 


lly-y|| 2 < 


















Q_i is a block diagonal matrix containing C 7 , C 2 7 , C 2 77 and C 1 ^ 
hence 

|AC,_!| <y 3 |Q_i.i 

Using direct call of computing trigonometric functions i.e. the view 
of{l4}, 

B^B^ + AB*, |AB^| </u |Bfc|, 

Thus, overall 


The DCT-II and DCT-IV coefficients are then quantized by 
transforming absence of 93.75% of the DCT coefficients 
(93.75% of DCT-II and DCT-IV coefficients in each transfer 
block are set to zero). In each block, the inverse 2D DCT-II 
and DCT-IV coefficients are computed. Finally, putting each 
block back together into a single image leads to Figures [2] 
and|3] 


y = (Ao + AA 0 ) • • • (A ? _2 + AA t -2) (Q_i+AQ_i) 

(B ? _2 + Ej_ 2 ) • • • (B 2 + E 2 )(Bi + ABi)(B 0 + AB 0 )x, 


l E *l < (^+ 73 ( 1 +^))!%! < ys|b*| 

Hence 

|y-y| < [(i+Y2)'(i+Y3)(i+Y5) ? " 3 -i] |a 0 ||a 1 |---|a,_ 2 | 

IQ-il |B ( _ 2 | |B f _31 * - • |B 0 1 |x| 


where 

(l+Y2) i (l+Y3)(l+Y5)'“ 3 -l < (1+Y2)'(1+Y5) ?_2 -1 

< (1+Y7)'-1 


1 — Y7f ‘ 

Since A^,C ? _i,B^ are orthogonal matrices, ||A^|| 2 = ||C ? _i|| 2 = 
||Bfc|| 2 = 1. By orthogonality of C 7 +1 , ||y|| 2 = ||x|| 2 . Hence 

l|y-y|| 2 < fr^llylb 

□ 


Corollary 4.8. y = C 7 +1 x is forward and backward stable. 

Proof The above theorem says that radix 2 DCT-I yields a 
tiny forward error provided that sin ^ and cos ^ are com¬ 
puted stably. It immediately follows that the computation 
is backward stable because y = y + Ay = C 7 +1 x + Ay im- 

plies y = C' +1 (x + Ax) with If we form y = 

C^ +1 x by using exact C* +1 , then |y -y| < y„+i \c[ +l \ |x| so 
l|y-y|| 2 < y^ + i ||y || 2 . As ju is of order u , the C 7 +1 has an error 
bound smaller than that for usual multiplication by the same 
factor as the reduction in complexity of the method, so DCT-I 
is perfectly stable. □ 


Figure [2] shows images with discarded coefficients (ex¬ 
cept the top left 6.25% in each transfer block) in each 
transfer block, after applying DCT-II algorithm, and then 
running recursively with the DCT-IV algorithm. 



Figure 2: (|2aj) Original Lena Image (|2b| Reconstructed im¬ 
age with 93.75% discarded DCT-II coefficients in each 8x8 
transfer block (2c) Reconstructed image with 93.75% dis¬ 
carded DCT-II coefficients in each 16 x 16 transfer block 


( 2d| Reconstructed image with 93.75% discarded DCT-II co¬ 
efficients in each 32 x 32 transfer block 


5 IMAGE COMPRESSION RESULTS 
BASED ON DCT ALGORITHMS 

Discretized images can be considered as matrices. To 
compress such images one can apply the quantization 
technique. In this section we use the quantization technique 
with the help of recursive DCT-II and DCT-IV algorithms 
to compress the Lena image of size 512 x 512 pixels. 
At first, the image is discretized into 8x8, 16 x 16, and 
32 x 32 transfer blocks. Next, using the recursive DCT-II and 
DCT-IV algorithms, 2D-DCTs are computed for each block. 


Figure [3] shows images with discarded coefficients (ex¬ 
cept the top left 6.25% in each transfer block) in each 
transfer block after applying DCT-IV algorithm and then 
running recursively with the DCT-II algorithm. 

Comparing to Figures [2] and [3j the image reconstruction 
results corresponding to DCT-II algorithm are better than 
that of the DCT-IV algorithm. Though the quality of re¬ 
constructed images in Figures [2] and [3] are somewhat lost, 
those images are clearly recognizable even though 93.75% 
















the left to the right. These signal flow graphs are correspond¬ 
ing to the decimation-in-frequency algorithms. However one 
can convert these decimation-in-frequency DCT algorithms 
into decimation-in-time DCT algorithms. In each Figure ([5j[6] 
I 7 J andpl), 8 := ^=, Qj := cos 0, and Sij = sin |j. As shown 



Figure 3: <[3aj) Original Lena Image (|3bj) Reconstructed im¬ 
age with 93.75% discarded DCT-IV coefficients in each 8x8 
transfer block pc] ) Reconstructed image with 93.75% dis¬ 
carded DCT-IV coefficients in each 16 x 16 transfer block 


( 3d| Reconstructed image with 93.75% discarded DCT-IV 
coefficients in each 32 x 32 transfer block 


of the DCT-II and DCT-IV coefficients are discarded in each 
transfer block. 


6 SIGNAL FLOW GRAPHS FOR DCT AL¬ 
GORITHMS 

Signal flow graphs commonly represent the realization of 
systems such as electronic devices in electrical engineering, 
control theory, system engineering, theoretical computer 
science, etc. Simply put, the objective is to build a device 
to implement or realize an algorithm, using devices that 
implement the algebraic operations used in these recursive 
algorithms. These building blocks are shown next in Figure 
[4] This section presents signal flow graphs for 9-point DCT-I 


x(l)- 


Adder 

x(0) 

1 


>x(0)+x(l) 


Gain 


x(0) - 2 —>a X (0) 


x(0> 


Splitter 

x(0) 


>x(0) 


Figure 4: Signal flow graphs building blocks 


and 8-point DCT II-IV algorithms via Figures [5] [6] [7] an d0 
As shown in the flow graphs, in each graph signal flows from 





























































































































































































































[5] 

[ 6 ] 

[7] 

[ 8 ] 
[9] 

[ 10 ] 

[ 11 ] 


in the Figures [6] [7J and[8j the input signals are in order: x = [12] 

{v(0),x(l), • • • ,x(7)} and output signals are in bit-reversed 
order: y = {y(0),y(4),};(2),y(6),3;(l),};(5),};(3),3;(7)}. 

In bit-reversed order, each output index is represented [13] 

as a binary number and the indices’ bits are reversed. 

Say for 8-point DCT II, the sequential order of the input 
indices’ bits is {000,001,010,011,100,101,110,111} ^ 14 j 

then reversing these input signal bits yields 
{000,100,010,110,001,101,011,111} which is the output 
signal. ^ 


7 CONCLUSION m 

This paper provided stable, completely recursive, radix- 
2 DCT-I and DCT-III algorithms having sparse, orthogonal [nj 
and rotation/rotation-reflection matrices, defined solely via 
DCT I-IV algorithms. The arithmetic cost and error bounds 
of computing DCT I-IV algorithms are addressed. Using the ^ 
recursive DCT-II and DCT-IV algorithms with the absence of 
93.75% coefficients in each transfer block in 2D DCT-II and 
DCT-IV, one can reconstruct 512 x 512 images without seri- 
ously affecting the quality. Signal flow graphs are presented 
for these solely based orthogonal factorization of DCT I-IV ^0] 
in decimation-of-frequency. 
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